On Semi-Automated Web Taxonomy Construction

Authors

  • Ravi Kumar
  • Prabhakar Raghavan
  • Sridhar Rajagopalan
  • Andrew Tomkins
Abstract

The subject of this paper is the semi-automatic construction of taxonomies over the Web. We address the problem of discovering high-quality resources that belong in a particular node of a taxonomy. We show that minimal additional effort is required to provide relevance feedback in a hyperlinked environment, resulting in significant and consistent improvement in quality. Furthermore, this feedback is especially valuable for topics for which it is more difficult to find high-quality pages. En route, we describe novel algorithms for hyperlink relevance feedback.

Similar resources

TaxaMiner: an experimentation framework for automated taxonomy bootstrapping

Hierarchical taxonomies and thesauri are frequently used by content management systems for indexing, search and categorization. They are also being viewed as rudimentary ontologies for the emerging Semantic Web infrastructure. However, to date, the development of taxonomies and thesauri is a human-intensive process, requiring huge resources in terms of cost and time. It is critical that approaches...

Visual divisive hierarchical clustering using k-means

This paper presents a browser-based semi-automatic taxonomy construction tool, Vd-chuck, which is able to incorporate text and data mining algorithms into a user-friendly interface. The presented system is browser-based. Its unsupervised learning for concept suggestion and different visualization techniques assist the user with textual and numerical data analysis. We tested the Vdchuck system on a ...

A Graph-Based Algorithm for Inducing Lexical Taxonomies from Scratch

In this paper we present a graph-based approach aimed at learning a lexical taxonomy automatically starting from a domain corpus and the Web. Unlike many taxonomy learning approaches in the literature, our novel algorithm learns both concepts and relations entirely from scratch via the automated extraction of terms, definitions and hypernyms. This results in a very dense, cyclic and possibly di...

METEOR–S WSDI: A Scalable Infrastructure of Registries for Semantic Publication and Discovery of Web Services

Web services are the new paradigm for distributed computing. They have much to offer towards interoperability of applications and integration of large scale distributed systems. To make Web services accessible to users, service providers use Web service registries to publish them. Current infrastructure of registries requires replication of all Web service publications in all Universal Business...

Bootstrapping Information Extraction from Semi-structured Web Pages

We consider the problem of extracting structured records from semi-structured web pages with no human supervision required for each target web site. Previous work on this problem has either required significant human effort for each target site or used brittle heuristics to identify semantic data types. Our method only requires annotation for a few pages from a few sites in the target domain. T...

Publication date: 2001